Malcolm is a powerful network traffic analysis tool suite designed with the following goals in mind:
- Easy to use – Malcolm accepts network traffic data in the form of full packet capture (PCAP) files and Zeek (formerly Bro) logs. These artifacts can be uploaded via a simple browser-based interface or captured live and forwarded to Malcolm using lightweight forwarders. In either case, the data is automatically normalized, enriched, and correlated for analysis.
- Powerful traffic analysis – Visibility into network communications is provided through two intuitive interfaces: OpenSearch Dashboards, a flexible data visualization plugin with dozens of prebuilt dashboards providing an at-a-glance overview of network protocols; and Arkime (formerly Moloch), a powerful tool for finding and identifying the network sessions comprising suspected security incidents.
- Streamlined deployment – Malcolm operates as a cluster of Docker containers, isolated sandboxes which each serve a dedicated function of the system. This Docker-based deployment model, combined with a few simple scripts for setup and run-time management, makes Malcolm suitable to be deployed quickly across a variety of platforms and use cases, whether it be for long-term deployment on a Linux server in a security operations center (SOC) or for incident response on a MacBook for an individual engagement.
- Secure communications – All communications with Malcolm, both from the user interface and from remote log forwarders, are secured with industry standard encryption protocols.
- Permissive license – Malcolm is composed of several widely used open source tools, making it an attractive alternative to security solutions requiring paid licenses.
- Expanding control systems visibility – While Malcolm is great for general-purpose network traffic analysis, its creators see a particular need in the community for tools providing insight into protocols used in industrial control systems (ICS) environments. Ongoing Malcolm development will aim to provide additional parsers for common ICS protocols.
Although all of the open source tools which make up Malcolm are already available and in general use, Malcolm provides a framework of interconnectivity which makes it greater than the sum of its parts. And while there are many other network traffic analysis solutions out there, ranging from complete Linux distributions like Security Onion to licensed products like Splunk Enterprise Security, the creators of Malcolm feel its easy deployment and robust combination of tools fill a void in the network security space that will make network traffic analysis accessible to many in both the public and private sectors as well as individual enthusiasts.
In short, Malcolm provides an easily deployable network analysis tool suite for full packet capture artifacts (PCAP files) and Zeek logs. While Internet access is required to build it, it is not required at runtime.
Quick start
Getting Malcolm
For a TL;DR example of downloading, configuring, and running Malcolm on a Linux platform, see Installation example using Ubuntu 20.04 LTS.
The scripts to control Malcolm require Python 3. The install.py script requires the requests module for Python 3, and will make use of the pythondialog module for user interaction (on Linux) if it is available.
Source code
The files required to build and run Malcolm are available on its GitHub page. Malcolm’s source code is released under the terms of a permissive open source software license (see License.txt for the terms of its release).
Building Malcolm from scratch
The build.sh script can build Malcolm’s Docker images from scratch. See Building from source for more information.
Initial configuration
You must run auth_setup prior to pulling Malcolm’s Docker images. You should also ensure your system configuration and docker-compose.yml settings are tuned by running ./scripts/install.py or ./scripts/install.py --configure (see System configuration and tuning).
Pull Malcolm’s Docker images
Malcolm’s Docker images are periodically built and hosted on Docker Hub. If you already have Docker and Docker Compose, these prebuilt images can be pulled by navigating into the Malcolm directory (containing the docker-compose.yml file) and running docker-compose pull like this:
$ docker-compose pull
Pulling api ... done
Pulling arkime ... done
Pulling dashboards ... done
Pulling dashboards-helper ... done
Pulling file-monitor ... done
Pulling filebeat ... done
Pulling freq ... done
Pulling htadmin ... done
Pulling logstash ... done
Pulling name-map-ui ... done
Pulling nginx-proxy ... done
Pulling opensearch ... done
Pulling pcap-capture ... done
Pulling pcap-monitor ... done
Pulling suricata ... done
Pulling upload ... done
Pulling zeek ... done
You can then observe that the images have been retrieved by running docker images:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
malcolmnetsec/api 6.0.0 xxxxxxxxxxxx 3 days ago 158MB
malcolmnetsec/arkime 6.0.0 xxxxxxxxxxxx 3 days ago 816MB
malcolmnetsec/dashboards 6.0.0 xxxxxxxxxxxx 3 days ago 1.02GB
malcolmnetsec/dashboards-helper 6.0.0 xxxxxxxxxxxx 3 days ago 184MB
malcolmnetsec/filebeat-oss 6.0.0 xxxxxxxxxxxx 3 days ago 624MB
malcolmnetsec/file-monitor 6.0.0 xxxxxxxxxxxx 3 days ago 588MB
malcolmnetsec/file-upload 6.0.0 xxxxxxxxxxxx 3 days ago 259MB
malcolmnetsec/freq 6.0.0 xxxxxxxxxxxx 3 days ago 132MB
malcolmnetsec/htadmin 6.0.0 xxxxxxxxxxxx 3 days ago 242MB
malcolmnetsec/logstash-oss 6.0.0 xxxxxxxxxxxx 3 days ago 1.35GB
malcolmnetsec/name-map-ui 6.0.0 xxxxxxxxxxxx 3 days ago 143MB
malcolmnetsec/nginx-proxy 6.0.0 xxxxxxxxxxxx 3 days ago 121MB
malcolmnetsec/opensearch 6.0.0 xxxxxxxxxxxx 3 days ago 1.17GB
malcolmnetsec/pcap-capture 6.0.0 xxxxxxxxxxxx 3 days ago 121MB
malcolmnetsec/pcap-monitor 6.0.0 xxxxxxxxxxxx 3 days ago 213MB
malcolmnetsec/suricata 6.0.0 xxxxxxxxxxxx 3 days ago 278MB
malcolmnetsec/zeek 6.0.0 xxxxxxxxxxxx 3 days ago 1GB
Import from pre-packaged tarballs
Once built, the malcolm_appliance_packager.sh script can be used to create pre-packaged Malcolm tarballs for import on another machine. See Pre-Packaged Installation Files for more information.
Starting and stopping Malcolm
Use the scripts in the scripts/ directory to start and stop Malcolm, view debug logs of a currently running instance, wipe the database and restore Malcolm to a fresh state, etc.
User interface
A few minutes after starting Malcolm (probably 5 to 10 minutes for Logstash to be completely up, depending on the system), the following services will be accessible:
- Arkime: https://localhost:443
- OpenSearch Dashboards: https://localhost/dashboards/ or https://localhost:5601
- Capture File and Log Archive Upload (Web): https://localhost/upload/
- Capture File and Log Archive Upload (SFTP): sftp://<username>@127.0.0.1:8022/files
- Host and Subnet Name Mapping Editor: https://localhost/name-map-ui/
- Account Management: https://localhost:488
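Because the services take a few minutes to come up, it can help to poll the HTTPS endpoints listed above rather than refreshing a browser. The following is a minimal sketch and not part of Malcolm: the function names service_urls and is_answering are hypothetical, certificate verification is disabled on the assumption that the instance uses self-signed certificates, and any HTTP response (including an authentication challenge) is treated as "answering."

```python
import ssl
import urllib.error
import urllib.request
from typing import Dict


def service_urls(host: str = "localhost") -> Dict[str, str]:
    """Return the HTTPS endpoints listed above for the given host."""
    return {
        "Arkime": f"https://{host}:443",
        "OpenSearch Dashboards": f"https://{host}/dashboards/",
        "Upload": f"https://{host}/upload/",
        "Name mapping editor": f"https://{host}/name-map-ui/",
        "Account management": f"https://{host}:488",
    }


def is_answering(url: str, timeout: float = 5.0) -> bool:
    """Return True if anything answers HTTP(S) at url, whatever the status."""
    # Assumption: self-signed certificates, so verification is skipped here.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context):
            return True
    except urllib.error.HTTPError:
        # The server replied (e.g., with an authentication challenge).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or otherwise unreachable.
        return False
```

Calling is_answering(url) for each value of service_urls() in a loop, with a short sleep between rounds, gives a rough readiness check. The SFTP upload endpoint is not covered by this sketch.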
Overview
Malcolm processes network traffic data in the form of packet capture (PCAP) files or Zeek logs. A sensor (packet capture appliance) monitors network traffic mirrored to it over a SPAN port on a network switch or router, or using a network TAP device. Zeek logs and Arkime sessions are generated containing important session metadata from the traffic observed, which are then securely forwarded to a Malcolm instance. Full PCAP files are optionally stored locally on the sensor device for examination later.
Malcolm parses the network session data and enriches it with additional lookups and mappings including GeoIP mapping, hardware manufacturer lookups from organizationally unique identifiers (OUI) in MAC addresses, assigning names to network segments and hosts based on user-defined IP address and MAC mappings, performing TLS fingerprinting, and many others.
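Two of the enrichments described above, the OUI vendor lookup and the naming of network segments, can be illustrated with a short Python sketch. This is not Malcolm's implementation (which performs enrichment in its Logstash pipeline); the lookup tables, field names, and function names below are made up for illustration.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical sample data: a real lookup would use the full OUI registry
# and the user's own segment definitions.
OUI_VENDORS: Dict[str, str] = {"AA:BB:CC": "ExampleCorp (made-up entry)"}
SEGMENTS: Dict[str, str] = {
    "10.0.0.0/8": "corporate",
    "192.168.100.0/24": "control-room",
}


def oui_vendor(mac: str) -> Optional[str]:
    """Look up the vendor from the first three octets (the OUI) of a MAC."""
    prefix = mac.upper().replace("-", ":")[:8]
    return OUI_VENDORS.get(prefix)


def segment_names(ip: str) -> List[str]:
    """Return the names of all user-defined segments containing ip."""
    address = ipaddress.ip_address(ip)
    return [
        name
        for cidr, name in SEGMENTS.items()
        if address in ipaddress.ip_network(cidr)
    ]


def enrich(session: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of a session record with vendor and segment fields."""
    enriched: Dict[str, object] = dict(session)
    enriched["source_vendor"] = oui_vendor(session["source_mac"])
    enriched["source_segments"] = segment_names(session["source_ip"])
    return enriched
```

For example, enrich({"source_mac": "aa:bb:cc:01:02:03", "source_ip": "192.168.100.7"}) adds the made-up vendor name and the "control-room" segment to the record.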
The enriched data is stored in an OpenSearch document store in a format suitable for analysis through two intuitive interfaces: OpenSearch Dashboards, a flexible data visualization plugin with dozens of prebuilt dashboards providing an at-a-glance overview of network protocols; and Arkime, a powerful tool for finding and identifying the network sessions comprising suspected security incidents. These tools can be accessed through a web browser from analyst workstations or for display in a security operations center (SOC). Logs can also optionally be forwarded on to another instance of Malcolm.
For smaller networks, use at home by network security enthusiasts, or in the field for incident response engagements, Malcolm can also easily be deployed locally on an ordinary consumer workstation or laptop. Malcolm can process local artifacts such as locally-generated Zeek logs, locally-captured PCAP files, and PCAP files collected offline without the use of a dedicated sensor appliance.
Components
Malcolm leverages the following excellent open source tools, among others.
- Arkime (formerly Moloch) - for PCAP file processing, browsing, searching, analysis, and carving/exporting; Arkime itself consists of two parts:
- OpenSearch - a search and analytics engine for indexing and querying network traffic session metadata
- Logstash and Filebeat - for ingesting and parsing Zeek log files and writing them into OpenSearch in a format that Arkime understands in the same way it natively understands PCAP data
- OpenSearch Dashboards - for creating additional ad-hoc visualizations and dashboards beyond that which is provided by Arkime viewer
- Zeek - a network analysis framework and IDS
- Suricata - an IDS and threat detection engine
- Yara - a tool used to identify and classify malware samples
- Capa - a tool for detecting capabilities in executable files
- ClamAV - an antivirus engine for scanning files extracted by Zeek
- CyberChef - a “swiss-army knife” data conversion tool
- jQuery File Upload - for uploading PCAP files and Zeek logs for processing
- List.js - for the host and subnet name mapping interface
- Docker and Docker Compose - for simple, reproducible deployment of the Malcolm appliance across environments and to coordinate communication between its various components
- Nginx - for HTTPS and reverse proxying Malcolm components
- nginx-auth-ldap - an LDAP authentication module for nginx
- Mark Baggett’s freq - a tool for calculating entropy of strings
- Florian Roth’s Signature-Base Yara ruleset
- These Zeek plugins:
- some of Amazon.com, Inc.’s ICS protocol analyzers
- Andrew Klaus’s Sniffpass plugin for detecting cleartext passwords in HTTP POST requests
- Andrew Klaus’s zeek-httpattacks plugin for detecting noncompliant HTTP requests
- ICS protocol analyzers for Zeek published by DHS CISA and Idaho National Lab
- Corelight’s “bad neighbor” (CVE-2020-16898) plugin
- Corelight’s “OMIGOD” (CVE-2021-38647) plugin
- Corelight’s “Log4Shell” (CVE-2021-44228) plugin
- Corelight’s Microsoft Excel privilege escalation detection (CVE-2021-42292) plugin
- Corelight’s Apache HTTP server 2.4.49-2.4.50 path traversal/RCE vulnerability (CVE-2021-41773) plugin
- Corelight’s bro-xor-exe plugin
- Corelight’s callstranger-detector plugin
- Corelight’s community ID flow hashing plugin
- Corelight’s HTTP protocol stack vulnerability (CVE-2021-31166) plugin
- Corelight’s pingback plugin
- Corelight’s ripple20 plugin
- Corelight’s SIGred plugin
- Corelight’s Zerologon plugin
- Corelight’s HTTP More Filenames plugin
- J-Gras’ Zeek::AF_Packet plugin
- Johanna Amann’s CVE-2020-0601 ECC certificate validation plugin and CVE-2020-13777 GnuTLS unencrypted session ticket detection plugin
- Lexi Brent’s EternalSafety plugin
- MITRE Cyber Analytics Repository’s Bro/Zeek ATT&CK®-Based Analytics (BZAR) script
- Salesforce’s gQUIC analyzer
- Salesforce’s HASSH SSH fingerprinting plugin
- Salesforce’s JA3 TLS fingerprinting plugin
- Zeek’s Spicy plugin framework
- GeoLite2 - Malcolm includes GeoLite2 data created by MaxMind
Supported Protocols
Malcolm uses Zeek and Arkime to analyze network traffic. These tools provide varying degrees of visibility into traffic transmitted over the following network protocols:
Traffic | Wiki | Organization/Specification | Arkime | Zeek |
---|---|---|---|---|
Internet layer | 🔗 | 🔗 | ✓ | ✓ |
Border Gateway Protocol (BGP) | 🔗 | 🔗 | ✓ | |
Building Automation and Control (BACnet) | 🔗 | 🔗 | ✓ | |
Bristol Standard Asynchronous Protocol (BSAP) | 🔗 | 🔗🔗 | ✓ | |
Distributed Computing Environment / Remote Procedure Calls (DCE/RPC) | 🔗 | 🔗 | ✓ | |
Dynamic Host Configuration Protocol (DHCP) | 🔗 | 🔗 | ✓ | ✓ |
Distributed Network Protocol 3 (DNP3) | 🔗 | 🔗 | ✓✓ | |
Domain Name System (DNS) | 🔗 | 🔗 | ✓ | ✓ |
EtherCAT | 🔗 | 🔗 | ✓ | |
EtherNet/IP / Common Industrial Protocol (CIP) | 🔗 🔗 | 🔗 | ✓ | |
FTP (File Transfer Protocol) | 🔗 | 🔗 | ✓ | |
GENISYS | 🔗🔗 | ✓ | ||
Google Quick UDP Internet Connections (gQUIC) | 🔗 | 🔗 | ✓ | ✓ |
Hypertext Transfer Protocol (HTTP) | 🔗 | 🔗 | ✓ | ✓ |
IPsec | 🔗 | 🔗 | ✓ | |
Internet Relay Chat (IRC) | 🔗 | 🔗 | ✓ | ✓ |
Lightweight Directory Access Protocol (LDAP) | 🔗 | 🔗 | ✓ | ✓ |
Kerberos | 🔗 | 🔗 | ✓ | ✓ |
Modbus | 🔗 | 🔗 | ✓✓ | |
MQ Telemetry Transport (MQTT) | 🔗 | 🔗 | ✓ | |
MySQL | 🔗 | 🔗 | ✓ | ✓ |
NT Lan Manager (NTLM) | 🔗 | 🔗 | ✓ | |
Network Time Protocol (NTP) | 🔗 | 🔗 | ✓ | |
Oracle | 🔗 | 🔗 | ✓ | |
Open Platform Communications Unified Architecture (OPC UA) Binary | 🔗 | 🔗 | ✓ | |
Open Shortest Path First (OSPF) | 🔗 | 🔗🔗🔗 | ✓ | |
OpenVPN | 🔗 | 🔗🔗 | ✓ | |
PostgreSQL | 🔗 | 🔗 | ✓ | |
Process Field Net (PROFINET) | 🔗 | 🔗 | ✓ | |
Remote Authentication Dial-In User Service (RADIUS) | 🔗 | 🔗 | ✓ | ✓ |
Remote Desktop Protocol (RDP) | 🔗 | 🔗 | ✓ | |
Remote Framebuffer (RFB) | 🔗 | 🔗 | ✓ | |
S7comm / Connection Oriented Transport Protocol (COTP) | 🔗 🔗 | 🔗 🔗 | ✓ | |
Secure Shell (SSH) | 🔗 | 🔗 | ✓ | ✓ |
Secure Sockets Layer (SSL) / Transport Layer Security (TLS) | 🔗 | 🔗 | ✓ | ✓ |
Session Initiation Protocol (SIP) | 🔗 | 🔗 | ✓ | |
Server Message Block (SMB) / Common Internet File System (CIFS) | 🔗 | 🔗 | ✓ | ✓ |
Simple Mail Transfer Protocol (SMTP) | 🔗 | 🔗 | ✓ | ✓ |
Simple Network Management Protocol (SNMP) | 🔗 | 🔗 | ✓ | ✓ |
SOCKS | 🔗 | 🔗 | ✓ | ✓ |
STUN (Session Traversal Utilities for NAT) | 🔗 | 🔗 | ✓ | ✓ |
Syslog | 🔗 | 🔗 | ✓ | ✓ |
Tabular Data Stream (TDS) | 🔗 | 🔗 🔗 | ✓ | ✓ |
Telnet / remote shell (rsh) / remote login (rlogin) | 🔗🔗 | 🔗🔗 | ✓ | ✓❋ |
TFTP (Trivial File Transfer Protocol) | 🔗 | 🔗 | ✓ | |
WireGuard | 🔗 | 🔗🔗 | ✓ | |
various tunnel protocols (e.g., GTP, GRE, Teredo, AYIYA, IP-in-IP, etc.) | 🔗 | ✓ | ✓ |
Additionally, Zeek is able to detect and, where possible, log the type, vendor and version of various other software protocols.
As part of its network traffic analysis, Zeek can extract and analyze files transferred across the protocols it understands. In addition to generating logs for transferred files, deeper analysis is done into the following file types:
- Portable executable files
- X.509 certificates
See automatic file extraction and scanning for additional features related to file scanning.
See Zeek log integration for more information on how Malcolm integrates Arkime sessions and Zeek logs for analysis.
Development
Checking out the Malcolm source code results in the following subdirectories in your malcolm/ working copy:
- api - code and configuration for the api container which provides a REST API to query Malcolm
- arkime - code and configuration for the arkime container which processes PCAP files using capture and which serves the Viewer application
- arkime-logs - an initially empty directory to which the arkime container will write some debug log files
- arkime-raw - an initially empty directory to which the arkime container will write captured PCAP files; as Arkime as employed by Malcolm is currently used for processing previously-captured PCAP files, this directory is currently unused
- Dockerfiles - a directory containing build instructions for Malcolm’s docker images
- docs - a directory containing instructions and documentation
- opensearch - an initially empty directory where the OpenSearch database instance will reside
- opensearch-backup - an initially empty directory for storing OpenSearch index snapshots
- filebeat - code and configuration for the filebeat container which ingests Zeek logs and forwards them to the logstash container
- file-monitor - code and configuration for the file-monitor container which can scan files extracted by Zeek
- file-upload - code and configuration for the upload container which serves a web browser-based upload form for uploading PCAP files and Zeek logs, and which serves an SFTP share as an alternate method for upload
- freq-server - code and configuration for the freq container used for calculating entropy of strings
- htadmin - configuration for the htadmin user account management container
- dashboards - code and configuration for the dashboards container for creating additional ad-hoc visualizations and dashboards beyond that which is provided by Arkime Viewer
- logstash - code and configuration for the logstash container which parses Zeek logs and forwards them to the opensearch container
- malcolm-iso - code and configuration for building an installer ISO for a minimal Debian-based Linux installation for running Malcolm
- name-map-ui - code and configuration for the name-map-ui container which provides the host and subnet name mapping interface
- nginx - configuration for the nginx reverse proxy container
- pcap - an initially empty directory for PCAP files to be uploaded, processed, and stored
- pcap-capture - code and configuration for the pcap-capture container which can capture network traffic
- pcap-monitor - code and configuration for the pcap-monitor container which watches for new or uploaded PCAP files and notifies the other services to process them
- scripts - control scripts for starting, stopping, restarting, etc. Malcolm
- sensor-iso - code and configuration for building a Hedgehog Linux ISO
- shared - miscellaneous code used by various Malcolm components
- suricata - code and configuration for the suricata container which handles PCAP processing using Suricata
- suricata-logs - an initially empty directory for Suricata logs to be uploaded, processed, and stored
- zeek - code and configuration for the zeek container which handles PCAP processing using Zeek
- zeek-logs - an initially empty directory for Zeek logs to be uploaded, processed, and stored
and the following files of special note:
- auth.env - the script ./scripts/auth_setup prompts the user for the administrator credentials used by the Malcolm appliance, and auth.env is the environment file where those values are stored
- cidr-map.txt - specify custom IP address to network segment mapping
- host-map.txt - specify custom IP and/or MAC address to host mapping
- net-map.json - an alternative to cidr-map.txt and host-map.txt, mapping hosts and network segments to their names in a JSON-formatted file
- docker-compose.yml - the configuration file used by docker-compose to build, start, and stop an instance of the Malcolm appliance
- docker-compose-standalone.yml - similar to docker-compose.yml, only used for the “packaged” installation of Malcolm
Building from source
Building the Malcolm docker images from scratch requires internet access to pull source files for its components. Once internet access is available, execute the following command to build all of the Docker images used by the Malcolm appliance:
$ ./scripts/build.sh
Then, go take a walk or something since it will be a while. When you’re done, you can run docker images and see you have fresh images for:
- malcolmnetsec/api (based on python:3-slim)
- malcolmnetsec/arkime (based on debian:11-slim)
- malcolmnetsec/dashboards-helper (based on alpine:3.15)
- malcolmnetsec/dashboards (based on opensearchproject/opensearch-dashboards)
- malcolmnetsec/file-monitor (based on debian:11-slim)
- malcolmnetsec/file-upload (based on debian:11-slim)
- malcolmnetsec/filebeat-oss (based on docker.elastic.co/beats/filebeat-oss)
- malcolmnetsec/freq (based on debian:11-slim)
- malcolmnetsec/htadmin (based on debian:11-slim)
- malcolmnetsec/logstash-oss (based on opensearchproject/logstash-oss-with-opensearch-output-plugin)
- malcolmnetsec/name-map-ui (based on alpine:3.15)
- malcolmnetsec/nginx-proxy (based on alpine:3.15)
- malcolmnetsec/opensearch (based on opensearchproject/opensearch)
- malcolmnetsec/pcap-capture (based on debian:11-slim)
- malcolmnetsec/pcap-monitor (based on debian:11-slim)
- malcolmnetsec/suricata (based on debian:11-slim)
- malcolmnetsec/zeek (based on debian:11-slim)
Alternately, if you have forked Malcolm on GitHub, workflow files are provided which contain instructions for GitHub to build the docker images and sensor and Malcolm installer ISOs. The resulting images are named according to the pattern ghcr.io/owner/malcolmnetsec/image:branch (e.g., if you’ve forked Malcolm with the GitHub user romeogdetlevjr, the arkime container built for the main branch would be named ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main). To run your local instance of Malcolm using these images instead of the official ones, you’ll need to edit your docker-compose.yml file(s) and replace the image: tags according to this new pattern, or use the bash helper script ./shared/bin/github_image_helper.sh to pull and re-tag the images.
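The naming pattern for fork-built images can be expressed as a small Python sketch, which may help when rewriting image: tags by hand. The function names are hypothetical and this is not what github_image_helper.sh does internally; it only reproduces the ghcr.io/owner/malcolmnetsec/image:branch pattern described above.

```python
def forked_image_name(owner: str, image: str, branch: str) -> str:
    """Build the name of an image produced by a fork's GitHub workflow."""
    return f"ghcr.io/{owner}/malcolmnetsec/{image}:{branch}"


def retag_official(image_ref: str, owner: str, branch: str) -> str:
    """Map an official reference such as 'malcolmnetsec/arkime:6.0.0'
    to the corresponding fork-built image name."""
    repository = image_ref.split(":", 1)[0]   # drop the version tag, if any
    image = repository.rsplit("/", 1)[-1]     # keep only the image name
    return forked_image_name(owner, image, branch)
```

With the example from the text, forked_image_name("romeogdetlevjr", "arkime", "main") yields ghcr.io/romeogdetlevjr/malcolmnetsec/arkime:main.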
Pre-Packaged installation files
Creating pre-packaged installation files
scripts/malcolm_appliance_packager.sh can be run to package up the configuration files (and, if necessary, the Docker images) which can be copied to a network share or USB drive for distribution to non-networked machines. For example:
$ ./scripts/malcolm_appliance_packager.sh
You must set a username and password for Malcolm, and self-signed X.509 certificates will be generated
Store administrator username/password for local Malcolm access? (Y/n):
Administrator username: analyst
analyst password:
analyst password (again):
(Re)generate self-signed certificates for HTTPS access (Y/n):
(Re)generate self-signed certificates for a remote log forwarder (Y/n):
Store username/password for forwarding Logstash events to a secondary, external OpenSearch instance (y/N):
Store username/password for email alert sender account (y/N):
Packaged Malcolm to "/home/user/tmp/malcolm_20190513_101117_f0d052c.tar.gz"
Do you need to package docker images also [y/N]? y
This might take a few minutes...
Packaged Malcolm docker images to "/home/user/tmp/malcolm_20190513_101117_f0d052c_images.tar.gz"
To install Malcolm:
1. Run install.py
2. Follow the prompts
To start, stop, restart, etc. Malcolm:
Use the control scripts in the "scripts/" directory:
- start (start Malcolm)
- stop (stop Malcolm)
- restart (restart Malcolm)
- logs (monitor Malcolm logs)
- wipe (stop Malcolm and clear its database)
- auth_setup (change authentication-related settings)
A minute or so after starting Malcolm, the following services will be accessible:
- Arkime: https://localhost/
- OpenSearch Dashboards: https://localhost/dashboards/
- PCAP upload (web): https://localhost/upload/
- PCAP upload (sftp): sftp://USERNAME@127.0.0.1:8022/files/
- Host and subnet name mapping editor: https://localhost/name-map-ui/
- Account management: https://localhost:488/
The above example will result in the following artifacts for distribution as explained in the script’s output:
$ ls -lh
total 2.0G
-rwxr-xr-x 1 user user 61k May 13 11:32 install.py
-rw-r--r-- 1 user user 2.0G May 13 11:37 malcolm_20190513_101117_f0d052c_images.tar.gz
-rw-r--r-- 1 user user 683 May 13 11:37 malcolm_20190513_101117_f0d052c.README.txt
-rw-r--r-- 1 user user 183k May 13 11:32 malcolm_20190513_101117_f0d052c.tar.gz
Installing from pre-packaged installation files
If you have obtained pre-packaged installation files to install Malcolm on a non-networked machine via an internal network share or on a USB key, you likely have the following files:
- malcolm_YYYYMMDD_HHNNSS_xxxxxxx.README.txt - This readme file contains minimal setup instructions for extracting the contents of the other tarballs and running the Malcolm appliance.
- malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz - This tarball contains the configuration files and directory configuration used by an instance of Malcolm. It can be extracted via tar -xf malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz upon which a directory will be created (named similarly to the tarball) containing the directories and configuration files. Alternatively, install.py can accept this filename as an argument and handle its extraction and initial configuration for you.
- malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz - This tarball contains the Docker images used by Malcolm. It can be imported manually via docker load -i malcolm_YYYYMMDD_HHNNSS_xxxxxxx_images.tar.gz
- install.py - This install script can load the Docker images and extract Malcolm configuration files from the aforementioned tarballs and do some initial configuration for you.
Run install.py malcolm_XXXXXXXX_XXXXXX_XXXXXXX.tar.gz and follow the prompts. If you do not already have Docker and Docker Compose installed, the install.py script will help you install them.
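When several packaged builds end up in the same directory, the file naming scheme above can be used to tell the pieces apart. The following Python sketch is illustrative only: the function name and the "kind" labels are hypothetical, and it assumes the trailing identifier is alphanumeric, as in the f0d052c example shown earlier.

```python
import re
from typing import Dict, Optional

# Assumption: names follow malcolm_YYYYMMDD_HHNNSS_xxxxxxx plus a suffix.
ARTIFACT_PATTERN = re.compile(
    r"^malcolm_(?P<date>\d{8})_(?P<time>\d{6})_(?P<build>[0-9A-Za-z]+)"
    r"(?P<suffix>_images\.tar\.gz|\.tar\.gz|\.README\.txt)$"
)

KINDS = {
    "_images.tar.gz": "docker images",
    ".tar.gz": "configuration",
    ".README.txt": "readme",
}


def classify_artifact(filename: str) -> Optional[Dict[str, str]]:
    """Return the timestamp, build identifier, and kind of a packaged
    Malcolm file, or None if the name does not follow the pattern."""
    match = ARTIFACT_PATTERN.match(filename)
    if match is None:
        return None
    return {
        "timestamp": f"{match['date']}_{match['time']}",
        "build": match["build"],
        "kind": KINDS[match["suffix"]],
    }
```

Files that share the same timestamp and build identifier belong to the same packaged build; install.py itself does not match the pattern and returns None.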
Preparing your system
Recommended system requirements
Malcolm runs on top of Docker which runs on recent releases of Linux, Apple macOS and Microsoft Windows 10.
To quote the Elasticsearch documentation, “If there is one resource that you will run out of first, it will likely be memory.” The same is true for Malcolm: you will want at least 16 gigabytes of RAM to run Malcolm comfortably. For processing large volumes of traffic, I’d recommend at a bare minimum a dedicated server with 16 cores and 16 gigabytes of RAM. Malcolm can run on less, but more is better. You’re going to want as much hard drive space as possible, of course, as the amount of PCAP data you’re able to analyze and store will be limited by your hard drive.
Arkime’s wiki has several documents and a calculator which may be helpful, although not everything in those documents will apply to a Docker-based setup like Malcolm.
System configuration and tuning
If you already have Docker and Docker Compose installed, the install.py script can still help you tune system configuration and docker-compose.yml parameters for Malcolm. To run it in “configuration only” mode, bypassing the steps to install Docker and Docker Compose, run it like this:
./scripts/install.py --configure
Although install.py will attempt to automate many of the following configuration and tuning parameters, they are nonetheless listed in the following sections for reference:
docker-compose.yml parameters
Edit docker-compose.yml and search for the OPENSEARCH_JAVA_OPTS key. Edit the -Xms4g -Xmx4g values, replacing 4g with a number that is half of your total system memory, or just under 32 gigabytes, whichever is less. So, for example, if I had 64 gigabytes of memory I would edit those values to be -Xms31g -Xmx31g. This indicates how much memory can be allocated to the OpenSearch heaps. For a pleasant experience, I would suggest not using a value under 10 gigabytes. Similar values can be modified for Logstash with LS_JAVA_OPTS, where using 3 or 4 gigabytes is recommended.
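The sizing rule for the OpenSearch heap can be written out as a few lines of Python. This is only a sketch of the arithmetic, not something install.py is known to contain; the function names are hypothetical, and the cap of 31 is an assumption taken from the "just under 32 gigabytes" guidance and the 64-gigabyte example.

```python
def recommended_heap_gb(total_memory_gb: int) -> int:
    """Half of system memory in whole gigabytes, capped at 31 (assumed cap)."""
    return max(1, min(total_memory_gb // 2, 31))


def opensearch_java_opts(total_memory_gb: int) -> str:
    """Format the -Xms/-Xmx pair to place in OPENSEARCH_JAVA_OPTS."""
    heap = recommended_heap_gb(total_memory_gb)
    return f"-Xms{heap}g -Xmx{heap}g"
```

For a 64-gigabyte machine, opensearch_java_opts(64) gives the same -Xms31g -Xmx31g values as the example in the text. On a 16-gigabyte machine the rule yields 8 gigabytes, which is below the 10 gigabytes suggested for a pleasant experience.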
Various other environment variables inside of docker-compose.yml can be tweaked to control aspects of how Malcolm behaves, particularly with regards to processing PCAP files and Zeek logs. The environment variables of particular interest are located near the top of that file under Commonly tweaked configuration options, which include:
ARKIME_ANALYZE_PCAP_THREADS
– the number of threads available to Arkime for analyzing PCAP files (default1
)AUTO_TAG
– if set totrue
, Malcolm will automatically create Arkime sessions and Zeek logs with tags based on the filename, as described in Tagging (defaulttrue
)BEATS_SSL
– if set totrue
, Logstash will use require encrypted communications for any external Beats-based forwarders from which it will accept logs; if Malcolm is being used as a standalone tool then this can safely be set tofalse
, but if external log feeds are to be accepted then setting it to true is recommended (defaultfalse
)CONNECTION_SECONDS_SEVERITY_THRESHOLD
- when severity scoring is enabled, this variable indicates the duration threshold (in seconds) for assigning severity to long connections (default3600
)EXTRACTED_FILE_CAPA_VERBOSE
– if set totrue
, all Capa rule hits will be logged; otherwise (false
) only MITRE ATT&CK® technique classifications will be loggedEXTRACTED_FILE_ENABLE_CAPA
– if set totrue
, Zeek-extracted files that are determined to be PE (portable executable) files will be scanned with CapaEXTRACTED_FILE_ENABLE_CLAMAV
– if set totrue
, Zeek-extracted files will be scanned with ClamAVEXTRACTED_FILE_ENABLE_YARA
– if set totrue
, Zeek-extracted files will be scanned with YaraEXTRACTED_FILE_HTTP_SERVER_ENABLE
– if set totrue
, the directory containing Zeek-extracted files will be served over HTTP at./extracted-files/
(e.g., https://localhost/extracted-files/ if you are connecting locally)EXTRACTED_FILE_HTTP_SERVER_ENCRYPT
– if set totrue
, those Zeek-extracted files will be AES-256-CBC-encrypted in anopenssl enc
-compatible format (e.g.,openssl enc -aes-256-cbc -d -in example.exe.encrypted -out example.exe
)EXTRACTED_FILE_HTTP_SERVER_KEY
– specifies the AES-256-CBC decryption password for encrypted Zeek-extracted files; used in conjunction withEXTRACTED_FILE_HTTP_SERVER_ENCRYPT
EXTRACTED_FILE_IGNORE_EXISTING
– if set totrue
, files extant in./zeek-logs/extract_files/
directory will be ignored on startup rather than scannedEXTRACTED_FILE_PRESERVATION
– determines behavior for preservation of Zeek-extracted filesEXTRACTED_FILE_UPDATE_RULES
– if set totrue
, file scanner engines (e.g., ClamAV, Capa, Yara) will periodically update their rule definitionsEXTRACTED_FILE_YARA_CUSTOM_ONLY
– if set totrue
, Malcolm will bypass the default Yara ruleset and use only user-defined rules in./yara/rules
FREQ_LOOKUP
- if set totrue
, domain names (from DNS queries and SSL server names) will be assigned entropy scores as calculated byfreq
(defaultfalse
)FREQ_SEVERITY_THRESHOLD
- when severity scoring is enabled, this variable indicates the entropy threshold for assigning severity to events with entropy scores calculated by freq; a lower value will only assign severity scores to fewer domain names with higher entropy (e.g., 2.0 for NQZHTFHRMYMTVBQJE.COM), while a higher value will assign severity scores to more domain names with lower entropy (e.g., 7.5 for naturallanguagedomain.example.org) (default 2.0)
- LOGSTASH_OUI_LOOKUP – if set to true, Logstash will map MAC addresses to vendors for all source and destination MAC addresses when analyzing Zeek logs (default true)
- LOGSTASH_REVERSE_DNS – if set to true, Logstash will perform a reverse DNS lookup for all external source and destination IP address values when analyzing Zeek logs (default false)
- LOGSTASH_SEVERITY_SCORING - if set to true, Logstash will perform severity scoring when analyzing Zeek logs (default true)
- MANAGE_PCAP_FILES – if set to true, all PCAP files imported into Malcolm will be marked as available for deletion by Arkime if available storage space becomes too low (default false)
- MAXMIND_GEOIP_DB_LICENSE_KEY - Malcolm uses MaxMind’s free GeoLite2 databases for GeoIP lookups. As of December 30, 2019, these databases are no longer available for download via a public URL. Instead, they must be downloaded using a MaxMind license key (available without charge from MaxMind). The license key can be specified here for GeoIP database downloads during build- and run-time.
- NGINX_BASIC_AUTH - if set to true, use TLS-encrypted HTTP basic authentication (default); if set to false, use Lightweight Directory Access Protocol (LDAP) authentication
- NGINX_LOG_ACCESS_AND_ERRORS - if set to true, all access to Malcolm via its web interfaces will be logged to OpenSearch (default false)
- NGINX_SSL - if set to true, require HTTPS connections to Malcolm’s nginx-proxy container (default); if set to false, use unencrypted HTTP connections (using unsecured HTTP connections is NOT recommended unless you are running Malcolm behind another reverse proxy like Traefik, Caddy, etc.)
- OS_EXTERNAL_HOSTS – if specified (in the format '10.0.0.123:9200'), logs received by Logstash will be forwarded on to another external OpenSearch instance in addition to the one maintained locally by Malcolm
- OS_EXTERNAL_SSL_CERTIFICATE_VERIFICATION – if set to true, Logstash will require full TLS certificate validation; this may fail if using self-signed certificates (default false)
- OS_EXTERNAL_SSL – if set to true, Logstash will use HTTPS for the connection to external OpenSearch instances specified in OS_EXTERNAL_HOSTS
- PCAP_ENABLE_NETSNIFF – if set to true, Malcolm will capture network traffic on the local network interface(s) indicated in PCAP_IFACE using netsniff-ng
- PCAP_ENABLE_TCPDUMP – if set to true, Malcolm will capture network traffic on the local network interface(s) indicated in PCAP_IFACE using tcpdump; there is no reason to enable both PCAP_ENABLE_NETSNIFF and PCAP_ENABLE_TCPDUMP
- PCAP_FILTER – specifies a tcpdump-style filter expression for local packet capture; leave blank to capture all traffic
- PCAP_IFACE – used to specify the network interface(s) for local packet capture if PCAP_ENABLE_NETSNIFF or PCAP_ENABLE_TCPDUMP is enabled; for multiple interfaces, separate the interface names with a comma (e.g., 'enp0s25' or 'enp10s0,enp11s0')
- PCAP_ROTATE_MEGABYTES – used to specify how large a locally-captured PCAP file can become (in megabytes) before it is closed for processing and a new PCAP file is created
- PCAP_ROTATE_MINUTES – used to specify a time interval (in minutes) after which a locally-captured PCAP file will be closed for processing and a new PCAP file created
- pipeline.workers, pipeline.batch.size and pipeline.batch.delay - these settings are used to tune the performance and resource utilization of the logstash container; see Tuning and Profiling Logstash Performance, logstash.yml and Multiple Pipelines
- PUID and PGID - Docker runs all of its containers as the privileged root user by default. For better security, Malcolm immediately drops to non-privileged user accounts for executing internal processes wherever possible. The PUID (process user ID) and PGID (process group ID) environment variables allow Malcolm to map internal non-privileged user accounts to a corresponding user account on the host.
- QUESTIONABLE_COUNTRY_CODES - when severity scoring is enabled, this variable defines a comma-separated list of countries of concern (using ISO 3166-1 alpha-2 codes) (default 'CN,IR,KP,RU,UA')
- SURICATA_AUTO_ANALYZE_PCAP_FILES – if set to true, all PCAP files imported into Malcolm will automatically be analyzed by Suricata, and the resulting logs will also be imported (default false)
- SURICATA_AUTO_ANALYZE_PCAP_THREADS – the number of threads available to Malcolm for analyzing Suricata logs (default 1)
- SURICATA_CUSTOM_RULES_ONLY – if set to true, Malcolm will bypass the default Suricata ruleset and use only user-defined rules (./suricata/rules/*.rules).
- SURICATA_… - the suricata container entrypoint script can use many more environment variables to tweak suricata.yaml; in that script, DEFAULT_VARS defines those variables (albeit without the SURICATA_ prefix you must add to each for use)
- TOTAL_MEGABYTES_SEVERITY_THRESHOLD - when severity scoring is enabled, this variable indicates the size threshold (in megabytes) for assigning severity to large connections or file transfers (default 1000)
- VTOT_API2_KEY – used to specify a VirusTotal Public API v2.0 key, which, if specified, will be used to submit hashes of Zeek-extracted files to VirusTotal
- ZEEK_AUTO_ANALYZE_PCAP_FILES – if set to true, all PCAP files imported into Malcolm will automatically be analyzed by Zeek, and the resulting logs will also be imported (default false)
- ZEEK_AUTO_ANALYZE_PCAP_THREADS – the number of threads available to Malcolm for analyzing Zeek logs (default 1)
- ZEEK_DISABLE_... - if set to any non-blank value, each of these variables can be used to disable a certain Zeek function when it analyzes PCAP files (for example, setting ZEEK_DISABLE_LOG_PASSWORDS to true to disable logging of cleartext passwords)
- ZEEK_DISABLE_BEST_GUESS_ICS - see “Best Guess” Fingerprinting for ICS Protocols
- ZEEK_EXTRACTOR_MODE – determines the file extraction behavior for file transfers detected by Zeek; see Automatic file extraction and scanning for more details
- ZEEK_INTEL_FEED_SINCE - when querying a TAXII or MISP feed, only process threat indicators that have been created or modified since the time represented by this value; it may be either a fixed date/time (01/01/2021) or relative interval (30 days ago)
- ZEEK_INTEL_ITEM_EXPIRATION - specifies the value for Zeek’s Intel::item_expiration timeout as used by the Zeek Intelligence Framework (default -1min, which disables item expiration)
- ZEEK_INTEL_REFRESH_CRON_EXPRESSION - specifies a cron expression indicating the refresh interval for generating the Zeek Intelligence Framework files (defaults to empty, which disables automatic refresh)
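As a small illustration of how a comma-separated value such as the QUESTIONABLE_COUNTRY_CODES default might be interpreted, here is a minimal Python sketch. The function names are hypothetical and this is not Malcolm's Logstash implementation; it only assumes the value is a list of ISO 3166-1 alpha-2 codes separated by commas, as described above.

```python
import os

# Default value taken from the QUESTIONABLE_COUNTRY_CODES description above.
DEFAULT_CODES = "CN,IR,KP,RU,UA"


def parse_country_codes(raw: str) -> set:
    """Split a comma-separated list of country codes into a normalized set."""
    return {code.strip().upper() for code in raw.split(",") if code.strip()}


def is_questionable(country_code: str, codes: set) -> bool:
    """Return True if the country code appears in the configured set."""
    return country_code.strip().upper() in codes


if __name__ == "__main__":
    # Fall back to the documented default when the variable is not set.
    codes = parse_country_codes(
        os.environ.get("QUESTIONABLE_COUNTRY_CODES", DEFAULT_CODES)
    )
    print(sorted(codes))
```

Normalizing case and whitespace up front keeps the membership check tolerant of values like 'cn, ir'.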
Linux host system configuration
Installing Docker
Docker installation instructions vary slightly by distribution. Please follow the instructions on docker.com that are specific to your distribution.
After installing Docker, because Malcolm should be run as a non-root user, add your user to the docker group with something like:
$ sudo usermod -aG docker yourusername
Following this, either reboot or log out then log back in.
Docker starts automatically on DEB-based distributions. On RPM-based distributions, you need to start it manually or enable it using the appropriate systemctl or service command(s).
You can test docker by running docker info, or (assuming you have internet access), docker run --rm hello-world.
Installing docker-compose
Please follow this link on docker.com for instructions on installing docker-compose.
Operating system configuration
The host system (i.e., the one running Docker) will need to be configured for the best possible OpenSearch performance. Here are a few suggestions for Linux hosts (these may vary from distribution to distribution):
- Append the following lines to /etc/sysctl.conf:
# the maximum number of open file handles
fs.file-max=2097152
# increase maximums for inotify watches
fs.inotify.max_user_watches=131072
fs.inotify.max_queued_events=131072
fs.inotify.max_user_instances=512
# the maximum number of memory map areas a process may have
vm.max_map_count=262144
# decrease "swappiness" (swapping out runtime memory vs. dropping pages)
vm.swappiness=1
# the maximum number of incoming connections
net.core.somaxconn=65535
# the % of system memory fillable with "dirty" pages before flushing
vm.dirty_background_ratio=40
# maximum % of dirty system memory before committing everything
vm.dirty_ratio=80
- Depending on your distribution, create either the file /etc/security/limits.d/limits.conf containing:
# the maximum number of open file handles
* soft nofile 65535
* hard nofile 65535
# do not limit the size of memory that can be locked
* soft memlock unlimited
* hard memlock unlimited
OR the file /etc/systemd/system.conf.d/limits.conf containing:
[Manager]
# the maximum number of open file handles
DefaultLimitNOFILE=65535:65535
# do not limit the size of memory that can be locked
DefaultLimitMEMLOCK=infinity
- Change the readahead value for the disk where the OpenSearch data will be stored. There are a few ways to do this. For example, you could add this line to /etc/rc.local (replacing /dev/sda with your disk block descriptor):
# change disk read-ahead value (# of blocks)
blockdev --setra 512 /dev/sda
- Change the I/O scheduler to deadline or noop. Again, this can be done in a variety of ways. The simplest is to add elevator=deadline to the arguments in GRUB_CMDLINE_LINUX in /etc/default/grub, then run sudo update-grub2
- If you are planning on using very large data sets, consider formatting the drive containing the opensearch volume as XFS.
After making all of these changes, do a reboot for good measure!
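After rebooting, it can be useful to confirm that the settings took effect. The following Python sketch compares a few of the sysctl values and resource limits suggested above against what the running system reports. Treating the suggested numbers as minimums is an assumption of this sketch, as are the helper names. It is not part of Malcolm, and it reports the limits of the process running the script, which may differ from those applied to the Docker daemon.

```python
import resource
from pathlib import Path

# Values copied from the sysctl.conf suggestions above (treated here as minimums).
RECOMMENDED_SYSCTL = {
    "fs.file-max": 2097152,
    "fs.inotify.max_user_watches": 131072,
    "fs.inotify.max_queued_events": 131072,
    "fs.inotify.max_user_instances": 512,
    "vm.max_map_count": 262144,
    "net.core.somaxconn": 65535,
}
RECOMMENDED_NOFILE = 65535


def sysctl_path(key: str, proc_root: str = "/proc/sys") -> str:
    """Map a dotted sysctl key to its procfs path."""
    return str(Path(proc_root, *key.split(".")))


def read_sysctl(key: str, proc_root: str = "/proc/sys"):
    """Read an integer kernel parameter; return None if it cannot be read."""
    try:
        return int(Path(sysctl_path(key, proc_root)).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def find_low_values(current: dict, recommended: dict = RECOMMENDED_SYSCTL) -> dict:
    """Return {key: (current, recommended)} for values missing or too low."""
    low = {}
    for key, wanted in recommended.items():
        value = current.get(key)
        if value is None or value < wanted:
            low[key] = (value, wanted)
    return low


def limit_ok(current: int, wanted: int) -> bool:
    """An unlimited resource limit satisfies any requested value."""
    return current == resource.RLIM_INFINITY or current >= wanted


if __name__ == "__main__":
    current = {key: read_sysctl(key) for key in RECOMMENDED_SYSCTL}
    for key, (value, wanted) in find_low_values(current).items():
        print(f"{key}: current={value}, suggested>={wanted}")

    soft_nofile, hard_nofile = resource.getrlimit(resource.RLIMIT_NOFILE)
    if not (limit_ok(soft_nofile, RECOMMENDED_NOFILE)
            and limit_ok(hard_nofile, RECOMMENDED_NOFILE)):
        print(f"nofile: soft={soft_nofile}, hard={hard_nofile}")

    soft_memlock, _hard_memlock = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft_memlock != resource.RLIM_INFINITY:
        print(f"memlock is limited to {soft_memlock} bytes")
```

The script prints nothing when every checked value meets the suggestion. vm.swappiness and the dirty-page ratios are omitted because they are tuning choices rather than minimums.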
macOS host system configuration
Automatic installation using install.py
The install.py script will attempt to guide you through the installation of Docker and Docker Compose if they are not present. If that works for you, you can skip ahead to Configure docker daemon option in this section.
Install Homebrew
The easiest way to install and maintain docker on Mac is using the Homebrew cask. Execute the following in a terminal.
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
$ brew install cask
$ brew tap homebrew/cask-versions
Install docker-edge
$ brew cask install docker-edge
This will install the latest version of docker and docker-compose. It can be upgraded later using brew as well:
$ brew cask upgrade --no-quarantine docker-edge
You can now run docker from the Applications folder.
Configure docker daemon option
Some changes should be made for performance (this link gives a good succinct overview).
Resource allocation - For a good experience, you likely need at least a quad-core MacBook Pro with 16GB RAM and an SSD. I have run Malcolm on an older 2013 MacBook Pro with 8GB of RAM, but the more the better. Go in your system tray and select Docker → Preferences → Advanced. Set the resources available to docker to at least 4 CPUs and 8GB of RAM (>= 16GB is preferable).
Volume mount performance - You can speed up performance of volume mounts by removing unused paths from Docker → Preferences → File Sharing. For example, if you’re only going to be mounting volumes under your home directory, you could share /Users but remove other paths.
After making these changes, right click on the Docker 🐋 icon in the system tray and select Restart.
Windows host system configuration
Installing and configuring Docker Desktop for Windows
Installing and configuring Docker to run under Windows must be done manually, rather than through the install.py script as is done for Linux and macOS.
- Be running Windows 10, version 1903 or higher
- Prepare your system and install WSL and a Linux distribution by running wsl --install -d Debian in PowerShell as Administrator (these instructions are tested with Debian, but may work with other distributions)
- Install Docker Desktop for Windows either by downloading the installer from the official Docker site or installing it through chocolatey.
- Follow the Docker Desktop WSL 2 backend instructions to finish configuration and review best practices
- Reboot
- Open the WSL distribution’s terminal and run docker info to make sure Docker is running
Finish Malcolm’s configuration
Once Docker is installed, configured and running as described in the previous section, run ./scripts/install.py --configure to finish configuration of the local Malcolm installation. Malcolm will be controlled and run from within your WSL distribution’s terminal environment.
Running Malcolm
Configure authentication
Malcolm requires authentication to access the user interface. Nginx can authenticate users with either local TLS-encrypted HTTP basic authentication or using a remote Lightweight Directory Access Protocol (LDAP) authentication server.
With the local basic authentication method, user accounts are managed by Malcolm and can be created, modified, and deleted using a user management web interface. This method is suitable in instances where accounts and credentials do not need to be synced across many Malcolm installations.
With LDAP authentication, user accounts are managed on a remote directory service, such as Microsoft Active Directory Domain Services or OpenLDAP.
Malcolm’s authentication method is defined in the x-auth-variables section near the top of the docker-compose.yml file with the NGINX_BASIC_AUTH environment variable: true for local TLS-encrypted HTTP basic authentication, false for LDAP authentication.
In either case, you must run ./scripts/auth_setup before starting Malcolm for the first time in order to:
- define the local Malcolm administrator account username and password (although these credentials will only be used for basic authentication, not LDAP authentication)
- specify whether or not to (re)generate the self-signed certificates used for HTTPS access
  - key and certificate files are located in the nginx/certs/ directory
- specify whether or not to (re)generate the self-signed certificates used by a remote log forwarder (see the BEATS_SSL environment variable above)
  - certificate authority, certificate, and key files for Malcolm’s Logstash instance are located in the logstash/certs/ directory
  - certificate authority, certificate, and key files to be copied to and used by the remote log forwarder are located in the filebeat/certs/ directory
- specify whether or not to store the username/password for forwarding Logstash events to a secondary, external OpenSearch instance (see the OS_EXTERNAL_HOSTS, OS_EXTERNAL_SSL, and OS_EXTERNAL_SSL_CERTIFICATE_VERIFICATION environment variables above)
  - these parameters are stored securely in the Logstash keystore file logstash/certs/logstash.keystore
- specify whether or not to store the username/password for email alert senders
  - these parameters are stored securely in the OpenSearch keystore file opensearch/opensearch.keystore
Local account management
auth_setup is used to define the username and password for the administrator account. Once Malcolm is running, the administrator account can be used to manage other user accounts via a Malcolm User Management page served over HTTPS on port 488 (e.g., https://localhost:488 if you are connecting locally).
Malcolm user accounts can be used to access the interfaces of all of its components, including Arkime. Arkime uses its own internal database of user accounts, so when a Malcolm user account logs in to Arkime for the first time Malcolm creates a corresponding Arkime user account automatically. This being the case, it is not recommended to use the Arkime Users settings page or change the password via the Password form under the Arkime Settings page, as those settings would not be consistently used across Malcolm.
Users may change their passwords via the Malcolm User Management page by clicking User Self Service. A forgotten password can also be reset via an emailed link, though this requires SMTP server settings to be specified in htadmin/config.ini in the Malcolm installation directory.
Lightweight Directory Access Protocol (LDAP) authentication
The nginx-auth-ldap module serves as the interface between Malcolm’s Nginx web server and a remote LDAP server. When you run auth_setup for the first time, a sample LDAP configuration file is created at nginx/nginx_ldap.conf.
# This is a sample configuration for the ldap_server section of nginx.conf.
# Yours will vary depending on how your Active Directory/LDAP server is configured.
# See https://github.com/kvspb/nginx-auth-ldap#available-config-parameters for options.
ldap_server ad_server {
  url "ldap://ds.example.com:3268/DC=ds,DC=example,DC=com?sAMAccountName?sub?(objectClass=person)";
  binddn "bind_dn";
  binddn_passwd "bind_dn_password";
  group_attribute member;
  group_attribute_is_dn on;
  require group "CN=Malcolm,CN=Users,DC=ds,DC=example,DC=com";
  require valid_user;
  satisfy all;
}
auth_ldap_cache_enabled on;
auth_ldap_cache_expiration_time 10000;
auth_ldap_cache_size 1000;
This file is mounted into the nginx container when Malcolm is started to provide connection information for the LDAP server.
The contents of nginx_ldap.conf will vary depending on how the LDAP server is configured. Some of the available parameters in that file include:
- url - the ldap:// or ldaps:// connection URL for the remote LDAP server, which has the following syntax: ldap[s]://<hostname>:<port>/<base_dn>?<attributes>?<scope>?<filter>
- binddn and binddn_passwd - the account credentials used to query the LDAP directory
- group_attribute - the group attribute name which contains the member object (e.g., member or memberUid)
- group_attribute_is_dn - whether or not to search for the user’s full distinguished name as the value in the group’s member attribute
- require and satisfy - require user, require group and require valid_user can be used in conjunction with satisfy any or satisfy all to limit the users that are allowed to access the Malcolm instance
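To make the url syntax above concrete, the following Python sketch splits an LDAP URL of that form into its components using only the standard library. The function name parse_ldap_url is hypothetical. This is an illustration of the syntax, not how nginx-auth-ldap itself parses the value.

```python
from urllib.parse import urlsplit


def parse_ldap_url(url: str) -> dict:
    """Split ldap[s]://<hostname>:<port>/<base_dn>?<attributes>?<scope>?<filter>."""
    parts = urlsplit(url)
    # Everything after the first "?" holds attributes, scope and filter,
    # themselves separated by "?" characters.
    fields = parts.query.split("?", 2) if parts.query else []
    fields += [""] * (3 - len(fields))
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        "port": parts.port,
        "base_dn": parts.path.lstrip("/"),
        "attributes": fields[0],
        "scope": fields[1],
        "filter": fields[2],
    }


if __name__ == "__main__":
    # The URL from the sample nginx_ldap.conf shown earlier.
    sample = ("ldap://ds.example.com:3268/DC=ds,DC=example,DC=com"
              "?sAMAccountName?sub?(objectClass=person)")
    for name, value in parse_ldap_url(sample).items():
        print(f"{name}: {value}")
```

Running it against the sample URL shows which part of the string corresponds to the base DN, the attribute used to match the login name, the search scope, and the search filter.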
Before starting Malcolm, edit nginx/nginx_ldap.conf according to the specifics of your LDAP server and directory tree structure. Using an LDAP search tool such as ldapsearch in Linux or dsquery in Windows may be of help as you formulate the configuration. Your changes should be made within the curly braces of the ldap_server ad_server { … } section. You can troubleshoot configuration file syntax errors and LDAP connection or credentials issues by running ./scripts/logs (or docker-compose logs nginx) and examining the output of the nginx container.
The Malcolm User Management page described above is not available when using LDAP authentication.
LDAP connection security
Authentication over LDAP can be done using one of three ways, two of which offer data confidentiality protection:
- StartTLS - the standard extension to the LDAP protocol to establish an encrypted SSL/TLS connection within an already established LDAP connection
- LDAPS - a commonly used (though unofficial and considered deprecated) method in which SSL negotiation takes place before any commands are sent from the client to the server
- Unencrypted (cleartext) (not recommended)
In addition to the NGINX_BASIC_AUTH environment variable being set to false in the x-auth-variables section near the top of the docker-compose.yml file, the NGINX_LDAP_TLS_STUNNEL environment variable is used in conjunction with the values in nginx/nginx_ldap.conf to define the LDAP connection security level. Use the following combinations of values to achieve the connection security methods above, respectively:
- StartTLS
  - NGINX_LDAP_TLS_STUNNEL set to true in docker-compose.yml
  - url should begin with ldap:// and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in nginx/nginx_ldap.conf
- LDAPS
  - NGINX_LDAP_TLS_STUNNEL set to false in docker-compose.yml
  - url should begin with ldaps:// and its port should be either the default LDAPS port (636) or the default LDAPS Global Catalog port (3269) in nginx/nginx_ldap.conf
- Unencrypted (clear text) (not recommended)
  - NGINX_LDAP_TLS_STUNNEL set to false in docker-compose.yml
  - url should begin with ldap:// and its port should be either the default LDAP port (389) or the default Global Catalog port (3268) in nginx/nginx_ldap.conf
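The three combinations above can be summarized as a small decision function. This Python sketch is only a reading aid with a hypothetical function name. It covers the three documented combinations and deliberately rejects anything else (such as ldaps:// with NGINX_LDAP_TLS_STUNNEL set to true), since that case is not described here.

```python
from urllib.parse import urlsplit


def ldap_security_method(url: str, stunnel: bool) -> str:
    """Name the connection security method implied by url and NGINX_LDAP_TLS_STUNNEL."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == "ldap":
        # StartTLS upgrades a plain LDAP connection; otherwise it stays cleartext.
        return "StartTLS" if stunnel else "Unencrypted"
    if scheme == "ldaps" and not stunnel:
        return "LDAPS"
    raise ValueError(
        f"combination not covered by the documentation: {scheme!r}, stunnel={stunnel}"
    )


if __name__ == "__main__":
    print(ldap_security_method("ldap://ds.example.com:3268/DC=ds", True))
    print(ldap_security_method("ldaps://ds.example.com:3269/DC=ds", False))
    print(ldap_security_method("ldap://ds.example.com:389/DC=ds", False))
```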
For encrypted connections (whether using StartTLS or LDAPS), Malcolm will require and verify certificates when one or more trusted CA certificate files are placed in the nginx/ca-trust/ directory. Otherwise, any certificate presented by the domain server will be accepted.
TLS certificates
When you set up authentication for Malcolm a set of unique self-signed TLS certificates are created which are used to secure the connection between clients (e.g., your web browser) and Malcolm’s browser-based interface. This is adequate for most Malcolm instances as they are often run locally or on internal networks, although your browser will most likely require you to add a security exception for the certificate the first time you connect to Malcolm.
Another option is to generate your own certificates (or have them issued to you) and have them placed in the nginx/certs/ directory. The certificate and key file should be named cert.pem and key.pem, respectively.
A third possibility is to use a third-party reverse proxy (e.g., Traefik or Caddy) to handle the issuance of the certificates for you and to broker the connections between clients and Malcolm. Reverse proxies such as these often implement the ACME protocol for domain name authentication and can be used to request certificates from certificate authorities like Let’s Encrypt. In this configuration, the reverse proxy will be encrypting the connections instead of Malcolm, so you’ll need to set the NGINX_SSL environment variable to false in docker-compose.yml (or answer no to the “Require encrypted HTTPS connections?” question posed by install.py). If you are setting NGINX_SSL to false, make sure you understand what you are doing and ensure that external connections cannot reach ports over which Malcolm will be communicating without encryption, including verifying your local firewall configuration.
Starting Malcolm
Docker compose is used to coordinate running the Docker containers. To start Malcolm, navigate to the directory containing docker-compose.yml and run:
$ ./scripts/start
This will create the containers’ virtual network and instantiate them, then leave them running in the background. The Malcolm containers may take several minutes to start up completely. To follow the debug output for an already-running Malcolm instance, run:
$ ./scripts/logs
You can also use docker stats to monitor the resource utilization of running containers.
Stopping and restarting Malcolm
You can run ./scripts/stop to stop the docker containers and remove their virtual network. Alternatively, ./scripts/restart will restart an instance of Malcolm. Because the data on disk is stored on the host in docker volumes, doing these operations will not result in loss of data.
Malcolm can be configured to be automatically restarted when the Docker system daemon restarts (for example, on system reboot). This behavior depends on the value of the restart: setting for each service in the docker-compose.yml file. This value can be set by running ./scripts/install.py --configure and answering “yes” to “Restart Malcolm upon system or Docker daemon restart?”
Clearing Malcolm’s data
Run ./scripts/wipe to stop the Malcolm instance and wipe its OpenSearch database (including index snapshots and management policies and alerting configuration).
Temporary read-only interface
To temporarily set the Malcolm user interfaces into a read-only configuration, run the following commands from the Malcolm installation directory.
First, to configure Nginx to disable access to the upload and other interfaces for changing Malcolm settings, and to deny HTTP methods other than GET and POST:
docker-compose exec nginx-proxy bash -c "cp /etc/nginx/nginx_readonly.conf /etc/nginx/nginx.conf && nginx -s reload"
Second, to set the existing OpenSearch data store to read-only:
docker-compose exec dashboards-helper /data/opensearch_read_only.py -i _cluster
These commands must be re-run every time you restart Malcolm.
Note that after you run these commands you may see an increase of error messages in the Malcolm containers’ output as various background processes will fail due to the read-only nature of the indices. Additionally, some features such as Arkime’s Hunt and building your own visualizations and dashboards in OpenSearch Dashboards will not function correctly in read-only mode.
Capture file and log archive upload
Malcolm serves a web browser-based upload form for uploading PCAP files and Zeek logs at https://localhost/upload/ if you are connecting locally.
Additionally, there is a writable files directory on an SFTP server served on port 8022 (e.g., sftp://USERNAME@localhost:8022/files/ if you are connecting locally).
The types of files supported are:
- PCAP files (of mime type application/vnd.tcpdump.pcap or application/x-pcapng)
  - PCAPNG files are partially supported: Zeek is able to process PCAPNG files, but not all of Arkime’s packet examination features work correctly
- Zeek logs in archive files (application/gzip, application/x-gzip, application/x-7z-compressed, application/x-bzip2, application/x-cpio, application/x-lzip, application/x-lzma, application/x-rar-compressed, application/x-tar, application/x-xz, or application/zip)
  - where the Zeek logs are found in the internal directory structure in the archive file does not matter
Files uploaded via these methods are monitored and moved automatically to other directories for processing to begin, generally within one minute of completion of the upload.
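The supported types listed above can be expressed as a simple lookup. The Python sketch below uses a hypothetical function name and return labels. It assumes the MIME type has already been determined by some other means, and it is not Malcolm's upload-handling code.

```python
# MIME types copied from the list of supported uploads above.
PCAP_TYPES = {
    "application/vnd.tcpdump.pcap",
    "application/x-pcapng",
}
ARCHIVE_TYPES = {
    "application/gzip",
    "application/x-gzip",
    "application/x-7z-compressed",
    "application/x-bzip2",
    "application/x-cpio",
    "application/x-lzip",
    "application/x-lzma",
    "application/x-rar-compressed",
    "application/x-tar",
    "application/x-xz",
    "application/zip",
}


def classify_upload(mime_type: str) -> str:
    """Return 'pcap', 'zeek-archive' or 'unsupported' for a MIME type."""
    mime = mime_type.strip().lower()
    if mime in PCAP_TYPES:
        return "pcap"
    if mime in ARCHIVE_TYPES:
        return "zeek-archive"
    return "unsupported"


if __name__ == "__main__":
    print(classify_upload("application/x-pcapng"))
    print(classify_upload("application/zip"))
    print(classify_upload("text/plain"))
```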
Tagging
In addition to being processed for uploading, Malcolm events will be tagged according to the components of the filenames of the PCAP files or Zeek log archive files from which the events were parsed. For example, records created from a PCAP file named ACME_Scada_VLAN10.pcap would be tagged with ACME, Scada, and VLAN10. Tags are extracted from filenames by splitting on the characters “,” (comma), “-” (dash), and “_” (underscore). These tags are viewable and searchable (via the tags field) in Arkime and OpenSearch Dashboards. This behavior can be changed by modifying the AUTO_TAG environment variable in docker-compose.yml.
Tags may also be specified manually with the browser-based upload form.
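The filename-splitting rule described above can be sketched in a few lines of Python. The function name derive_tags is hypothetical. Malcolm's own implementation may apply additional filtering that this sketch does not attempt to reproduce.

```python
import os
import re


def derive_tags(filename: str) -> list:
    """Split a filename (without directory or final extension) on ',', '-' and '_'."""
    stem, _extension = os.path.splitext(os.path.basename(filename))
    # Drop empty strings produced by consecutive or leading separators.
    return [part for part in re.split(r"[,\-_]", stem) if part]


if __name__ == "__main__":
    print(derive_tags("ACME_Scada_VLAN10.pcap"))
```

For the example filename from the text, this yields the tags ACME, Scada, and VLAN10. Note that os.path.splitext removes only the last extension, so a name ending in .tar.gz would keep the .tar portion in its final tag.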
Processing uploaded PCAPs with Zeek and Suricata
The Analyze with Zeek and Analyze with Suricata checkboxes may be used when uploading PCAP files to cause them to be analyzed by Zeek and Suricata, respectively. This is functionally equivalent to the ZEEK_AUTO_ANALYZE_PCAP_FILES and SURICATA_AUTO_ANALYZE_PCAP_FILES environment variables described above, only on a per-upload basis. Zeek can also automatically carve out files from file transfers; see Automatic file extraction and scanning for more details.
Live analysis
Capturing traffic on local network interfaces
Malcolm’s pcap-capture container can capture traffic on one or more local network interfaces and periodically rotate these files for processing. The pcap-capture Docker container is started with additional privileges (IPC_LOCK, NET_ADMIN, NET_RAW, and SYS_ADMIN) in order for it to be able to open network interfaces in promiscuous mode for capture.
The environment variables prefixed with PCAP_ in the docker-compose.yml file determine local packet capture behavior. Local capture can also be configured by running ./scripts/install.py --configure and answering “yes” to “Should Malcolm capture network traffic to PCAP files?”
Note that currently Microsoft Windows and Apple macOS platforms run Docker inside of a virtualized environment. This would require additional configuration of virtual interfaces and port forwarding in Docker, the process for which is outside of the scope of this document.
Using a network sensor appliance
A remote network sensor appliance can be used to monitor network traffic, capture PCAP files, and forward Zeek logs, Arkime sessions, or other information to Malcolm. Hedgehog Linux is a Debian-based operating system built to
- monitor network interfaces
- capture packets to PCAP files
- detect file transfers in network traffic and extract and scan those files for threats
- generate and forward Zeek logs, Arkime sessions, and other information to Malcolm
Please see the Hedgehog Linux README for more information.
Manually forwarding logs from an external source
Malcolm’s Logstash instance can also be configured to accept logs from a remote forwarder by running ./scripts/install.py --configure and answering “yes” to “Expose Logstash port to external hosts?” Enabling encrypted transport of these log files is discussed in Configure authentication and the description of the BEATS_SSL environment variable in the docker-compose.yml file.
Configuring Filebeat to forward Zeek logs to Malcolm might look something like this example filebeat.yml:
filebeat.inputs:
- type: log
  paths:
    - /var/zeek/*.log
  fields_under_root: true
  fields:
    type: "session"
  compression_level: 0
  exclude_lines: ['^\s*#']
  scan_frequency: 10s
  clean_inactive: 180m
  ignore_older: 120m
  close_inactive: 90m
  close_renamed: true
  close_removed: true
  close_eof: false
  clean_renamed: true
  clean_removed: true
output.logstash:
  hosts: ["192.0.2.123:5044"]
  ssl.enabled: true
  ssl.certificate_authorities: ["/foo/bar/ca.crt"]
  ssl.certificate: "/foo/bar/client.crt"
  ssl.key: "/foo/bar/client.key"
  ssl.supported_protocols: "TLSv1.2"
  ssl.verification_mode: "none"
Installation example using Ubuntu 20.04 LTS
Here’s a step-by-step example of getting Malcolm from GitHub, configuring your system and your Malcolm instance, and running it on a system running Ubuntu Linux. Your mileage may vary depending on your individual system configuration, but this should be a good starting point.
The commands in this example should be executed as a non-root user.
You can use git to clone Malcolm into a local working copy, or you can download and extract the artifacts from the latest release.
To install Malcolm from the latest Malcolm release, browse to the Malcolm releases page on GitHub and download at a minimum install.py and the malcolm_YYYYMMDD_HHNNSS_xxxxxxx.tar.gz file, then navigate to your downloads directory:
user@host:~$ cd Downloads/
user@host:~/Downloads$ ls
malcolm_common.py install.py malcolm_20190611_095410_ce2d8de.tar.gz
If you are obtaining Malcolm using git instead, run the following command to clone Malcolm into a local working copy:
user@host:~$ git clone https://github.com/cisagov/Malcolm
Cloning into 'Malcolm'...
remote: Enumerating objects: 443, done.
remote: Counting objects: 100% (443/443), done.
remote: Compressing objects: 100% (310/310), done.
remote: Total 443 (delta 81), reused 441 (delta 79), pack-reused 0
Receiving objects: 100% (443/443), 6.87 MiB | 18.86 MiB/s, done.
Resolving deltas: 100% (81/81), done.
user@host:~$ cd Malcolm/
Next, run the install.py script to configure your system. Replace user in this example with your local account username, and follow the prompts. Most questions have an acceptable default you can accept by pressing the Enter key. Depending on whether you are installing Malcolm from the release tarball or inside of a git working copy, the questions below will be slightly different, but for the most part are the same.
user@host:~/Downloads$ sudo ./install.py
Installing required packages: ['apache2-utils', 'make', 'openssl']
"docker info" failed, attempt to install Docker? (Y/n): y
Attempt to install Docker using official repositories? (Y/n): y
Installing required packages: ['apt-transport-https', 'ca-certificates', 'curl', 'gnupg-agent', 'software-properties-common']
Installing docker packages: ['docker-ce', 'docker-ce-cli', 'containerd.io']
Installation of docker packages apparently succeeded
Add a non-root user to the "docker" group? (y/n): y
Enter user account: user
Add another non-root user to the "docker" group? (y/n): n
"docker-compose version" failed, attempt to install docker-compose? (Y/n): y
Install docker-compose directly from docker github? (Y/n): y
Download and installation of docker-compose apparently succeeded
fs.file-max increases allowed maximum for file handles
fs.file-max= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y
fs.inotify.max_user_watches increases allowed maximum for monitored files
fs.inotify.max_user_watches= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y
fs.inotify.max_queued_events increases queue size for monitored files
fs.inotify.max_queued_events= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y
fs.inotify.max_user_instances increases allowed maximum monitor file watchers
fs.inotify.max_user_instances= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y
vm.max_map_count increases allowed maximum for memory segments
vm.max_map_count= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y
net.core.somaxconn increases allowed maximum for socket connections
net.core.somaxconn= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y
vm.swappiness adjusts the preference of the system to swap vs. drop runtime memory pages
vm.swappiness= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y
vm.dirty_background_ratio defines the percentage of system memory fillable with "dirty" pages before flushing
vm.dirty_background_ratio= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y
vm.dirty_ratio defines the maximum percentage of dirty system memory before committing everything
vm.dirty_ratio= appears to be missing from /etc/sysctl.conf, append it? (Y/n): y
/etc/security/limits.d/limits.conf increases the allowed maximums for file handles and memlocked segments
/etc/security/limits.d/limits.conf does not exist, create it? (Y/n): y
At this point, if you are installing from a release tarball, you will be asked whether you would like to extract the contents of the tarball and to specify the installation directory:
Extract Malcolm runtime files from /home/user/Downloads/malcolm_20190611_095410_ce2d8de.tar.gz (Y/n): y
Enter installation path for Malcolm [/home/user/Downloads/malcolm]: /home/user/Malcolm
Malcolm runtime files extracted to /home/user/Malcolm
Alternatively, if you are configuring Malcolm from within a git working copy, install.py will now exit. Run install.py again like you did at the beginning of the example, only remove the sudo and add --configure to run install.py in “configuration only” mode.
user@host:~/Malcolm$ ./scripts/install.py --configure
Now that any necessary system configuration changes have been made, the local Malcolm instance will be configured:
Malcolm processes will run as UID 1000 and GID 1000. Is this OK? (Y/n): y
Setting 10g for OpenSearch and 3g for Logstash. Is this OK? (Y/n): y
Setting 3 workers for Logstash pipelines. Is this OK? (Y/n): y
Restart Malcolm upon system or Docker daemon restart? (y/N): y
1: no
2: on-failure
3: always
4: unless-stopped
Select Malcolm restart behavior (unless-stopped): 4
Require encrypted HTTPS connections? (Y/n): y
Will Malcolm be running behind another reverse proxy (Traefik, Caddy, etc.)? (y/N): n
Specify external Docker network name (or leave blank for default networking) ():
Authenticate against Lightweight Directory Access Protocol (LDAP) server? (y/N): n
Configure OpenSearch index state management? (y/N): n
Automatically analyze all PCAP files with Zeek? (Y/n): y
Perform reverse DNS lookup locally for source and destination IP addresses in Zeek logs? (y/N): n
Perform hardware vendor OUI lookups for MAC addresses? (Y/n): y
Perform string randomness scoring on some fields? (Y/n): y
Expose OpenSearch port to external hosts? (y/N): n
Expose Logstash port to external hosts? (y/N): n
Forward Logstash logs to external OpenSearch instance? (y/N): n
Enable file extraction with Zeek? (y/N): y
1: none
2: known
3: mapped
4: all
5: interesting
Select file extraction behavior (none): 5
1: quarantined
2: all
3: none
Select file preservation behavior (quarantined): 1
Scan extracted files with ClamAV? (y/N): y
Scan extracted files with Yara? (y/N): y
Scan extracted PE files with Capa? (y/N): y
Lookup extracted file hashes with VirusTotal? (y/N): n
Download updated scanner signatures periodically? (Y/n): y
Should Malcolm capture network traffic to PCAP files? (y/N): y
Specify capture interface(s) (comma-separated): eth0
Capture packets using netsniff-ng? (Y/n): y
Capture packets using tcpdump? (y/N): n
Malcolm has been installed to /home/user/Malcolm. See README.md for more information.
Scripts for starting and stopping Malcolm and changing authentication-related settings can be found in /home/user/Malcolm/scripts.
At this point you should reboot your computer so that the new system settings can be applied. After rebooting, log back in and return to the directory to which Malcolm was installed (or to which the git working copy was cloned).
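After the reboot, it can be useful to confirm that the kernel settings the installer offered to append are actually present. The following is a minimal sketch, not part of Malcolm: the key list is copied from the installer prompts shown above, while the helper name `missing_sysctl_keys` is hypothetical. It only checks that each key has an uncommented line; it makes no assumption about the values Malcolm writes.

```python
# Keys taken from the install.py prompts in the example above.
EXPECTED_KEYS = [
    "fs.file-max",
    "fs.inotify.max_user_watches",
    "fs.inotify.max_queued_events",
    "fs.inotify.max_user_instances",
    "vm.max_map_count",
    "net.core.somaxconn",
    "vm.swappiness",
    "vm.dirty_background_ratio",
    "vm.dirty_ratio",
]


def missing_sysctl_keys(conf_text, expected=EXPECTED_KEYS):
    """Return the expected keys that have no uncommented `key = value` line.

    Hypothetical helper for illustration; it inspects text in sysctl.conf
    format and does not query the running kernel.
    """
    present = set()
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (sysctl.conf accepts both # and ;).
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, _value = line.partition("=")
        if sep:
            present.add(key.strip())
    return [key for key in expected if key not in present]
```

To use it, pass in the contents of /etc/sysctl.conf (for example, read with `open("/etc/sysctl.conf").read()`); an empty list means every key from the prompts was found.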
Now we need to set up authentication and generate some unique self-signed TLS certificates. You can replace analyst in this example with whatever username you wish to use to log in to the Malcolm web interface.
user@host:~/Malcolm$ ./scripts/auth_setup
Store administrator username/password for local Malcolm access? (Y/n):
Administrator username: analyst
analyst password:
analyst password (again):
(Re)generate self-signed certificates for HTTPS access (Y/n):
(Re)generate self-signed certificates for a remote log forwarder (Y/n):
Store username/password for forwarding Logstash events to a secondary, external OpenSearch instance (y/N):
Store username/password for email alert sender account (y/N):
For now, rather than build Malcolm from scratch, we’ll pull images from Docker Hub:
user@host:~/Malcolm$ docker-compose pull
Pulling api ... done
Pulling arkime ... done
Pulling dashboards ... done
Pulling dashboards-helper ... done
Pulling file-monitor ... done
Pulling filebeat ... done
Pulling freq ... done
Pulling htadmin ... done
Pulling logstash ... done
Pulling name-map-ui ... done
Pulling nginx-proxy ... done
Pulling opensearch ... done
Pulling pcap-capture ... done
Pulling pcap-monitor ... done
Pulling suricata ... done
Pulling upload ... done
Pulling zeek ... done
user@host:~/Malcolm$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
malcolmnetsec/api 6.0.0 xxxxxxxxxxxx 3 days ago 158MB
malcolmnetsec/arkime 6.0.0 xxxxxxxxxxxx 3 days ago 816MB
malcolmnetsec/dashboards 6.0.0 xxxxxxxxxxxx 3 days ago 1.02GB
malcolmnetsec/dashboards-helper 6.0.0 xxxxxxxxxxxx 3 days ago 184MB
malcolmnetsec/filebeat-oss 6.0.0 xxxxxxxxxxxx 3 days ago 624MB
malcolmnetsec/file-monitor 6.0.0 xxxxxxxxxxxx 3 days ago 588MB
malcolmnetsec/file-upload 6.0.0 xxxxxxxxxxxx 3 days ago 259MB
malcolmnetsec/freq 6.0.0 xxxxxxxxxxxx 3 days ago 132MB
malcolmnetsec/htadmin 6.0.0 xxxxxxxxxxxx 3 days ago 242MB
malcolmnetsec/logstash-oss 6.0.0 xxxxxxxxxxxx 3 days ago 1.35GB
malcolmnetsec/name-map-ui 6.0.0 xxxxxxxxxxxx 3 days ago 143MB
malcolmnetsec/nginx-proxy 6.0.0 xxxxxxxxxxxx 3 days ago 121MB
malcolmnetsec/opensearch 6.0.0 xxxxxxxxxxxx 3 days ago 1.17GB
malcolmnetsec/pcap-capture 6.0.0 xxxxxxxxxxxx 3 days ago 121MB
malcolmnetsec/pcap-monitor 6.0.0 xxxxxxxxxxxx 3 days ago 213MB
malcolmnetsec/suricata 6.0.0 xxxxxxxxxxxx 3 days ago 278MB
malcolmnetsec/zeek 6.0.0 xxxxxxxxxxxx 3 days ago 1GB
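As a quick sanity check on the pull, the listing above can be scanned to confirm that every Malcolm image carries the same tag. This is a rough sketch that assumes the whitespace-separated layout of the `docker images` output shown here; the function names are hypothetical and not part of Malcolm.

```python
def malcolm_image_tags(docker_images_output, prefix="malcolmnetsec/"):
    """Map each repository starting with `prefix` to its tag.

    Assumes the first two whitespace-separated columns are REPOSITORY and TAG,
    as in the listing above.
    """
    tags = {}
    for line in docker_images_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith(prefix):
            tags[fields[0]] = fields[1]
    return tags


def tags_are_consistent(tags):
    """True when at least one image was found and all images share one tag."""
    return len(set(tags.values())) == 1
```

Feeding it the text printed by `docker images` and checking `tags_are_consistent(...)` flags a partially updated set of images before Malcolm is started.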
Finally, we can start Malcolm. When Malcolm starts, it will stream informational and debug messages to the console. If you wish, you can safely close the console or use Ctrl+C to stop these messages; Malcolm will continue running in the background.
user@host:~/Malcolm$ ./scripts/start
In a few minutes, Malcolm services will be accessible via the following URLs:
------------------------------------------------------------------------------
- Arkime: https://localhost/
- OpenSearch Dashboards: https://localhost/dashboards/
- PCAP upload (web): https://localhost/upload/
- PCAP upload (sftp): sftp://username@127.0.0.1:8022/files/
- Host and subnet name mapping editor: https://localhost/name-map-ui/
- Account management: https://localhost:488/
…
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
…
Attaching to malcolm_nginx-proxy_1, malcolm_dashboards_1, malcolm_filebeat_1, malcolm_upload_1, malcolm_pcap-monitor_1, malcolm_arkime_1, malcolm_zeek_1, malcolm_dashboards-helper_1, malcolm_logstash_1, malcolm_freq_1, malcolm_opensearch_1, malcolm_htadmin_1, malcolm_pcap-capture_1, malcolm_suricata_1, malcolm_file-monitor_1, malcolm_name-map-ui_1
…
It will take several minutes for all of Malcolm’s components to start up. Logstash will take the longest, probably 3 to 5 minutes. You’ll know Logstash is fully ready once it has printed its series of startup messages, ending with lines like these:
…
logstash_1 | [2019-06-11T15:45:42,009][INFO ][logstash.agent ] Pipelines running {:count=>4, :running_pipelines=>[:"malcolm-output", :"malcolm-input", :"malcolm-zeek", :"malcolm-enrichment"], :non_running_pipelines=>[]}
logstash_1 | [2019-06-11T15:45:42,599][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
…
You can now open a web browser and navigate to one of the Malcolm user interfaces.
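The Logstash readiness check described above can also be scripted instead of watched by eye. The sketch below is an illustration only: it looks for the two messages quoted in the sample log lines, and other Malcolm or Logstash versions may word them differently. The function name `logstash_ready` is hypothetical.

```python
import re

# Markers copied from the sample log lines above. The first requires an empty
# :non_running_pipelines list, as in that sample.
PIPELINES_RE = re.compile(
    r"Pipelines running \{:count=>\d+.*:non_running_pipelines=>\[\]"
)
API_MARKER = "Successfully started Logstash API endpoint"


def logstash_ready(lines):
    """Return True once both startup messages have appeared in `lines`.

    `lines` is any iterable of log lines, such as a list or an open stream.
    """
    pipelines_up = False
    api_up = False
    for line in lines:
        if PIPELINES_RE.search(line):
            pipelines_up = True
        if API_MARKER in line:
            api_up = True
        if pipelines_up and api_up:
            return True
    return False
```

One way to use it is to pipe the container logs into a small script and call `logstash_ready(sys.stdin)`, which returns as soon as both messages have been seen.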
Major releases
The Malcolm project uses semantic versioning when choosing version numbers. If you are moving between major releases (e.g., from v4.0.1 to v5.0.0), you’re likely to find enough backwards-incompatible changes that upgrading in place may not be worth the time and trouble. A fresh install is strongly recommended between major releases.
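Under semantic versioning, the first number in a version is the major release, so deciding whether a fresh install is advisable comes down to comparing that number. The helper below is a minimal sketch with hypothetical names; it assumes plain `vMAJOR.MINOR.PATCH` strings like those in the example and does not handle pre-release or build suffixes.

```python
def parse_version(version):
    """Split 'v4.0.1' or '4.0.1' into a (major, minor, patch) tuple of ints."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def crosses_major_release(current, target):
    """True when moving from `current` to `target` changes the major version."""
    return parse_version(current)[0] != parse_version(target)[0]
```

For the example above, `crosses_major_release("v4.0.1", "v5.0.0")` is true, which is the case where a fresh install is recommended.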